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The fermion determinant is a highly non-local object and its logarithm is an extensive quantity. 
For these reasons it is widely believed that the determinant cannot be treated in acceptance steps 
of gauge link configurations that differ in a large fraction of the links. However, for exact fac- 
torisations of the determinant that separate the ultraviolet from the infra-red modes of the Dirac 
operator it is known that the latter show less variation under changes of the gauge field compared 
to the former. Using a factorisation based on recursive domain decomposition allows for a hier- 
archical algorithm that starts with pure gauge updates of the links within the domains and ends 
after a number of filters with a global acceptance step. We find that the global acceptance rate is 
high on moderate lattice sizes. Whether this type of algorithm can help in curing the problem of 
critical slowing down is presently under study. 
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1. Introduction 

In this contribution we analyse the acceptance rate of global acceptance steps with the fermion 
determinant ^ in lattice QCD. The type of algorithm detailed in the following sections is highly 
flexible with respect to the fermion and gauge action used, because complicated and expensive 
force computations are avoided. The proposed changes of the gauge configuration come from a 
hierarchical filter (section ^) that separates short and long distance physics. Hierarchical filters 
based on approximations of the determinant with increasing accuracy were introduced and tested 
in The advantage of our approach, based on recursive domain decomposition, is that it allows 
for a decoupling of updates within the domains and thus parallel domain-wise acceptance steps in 
the filter (section ||]). In [^] a serial hierarchical filter of increasing block size was proposed, but 
never tested. 

The determinant in the global acceptance step has to be treated stochastically, but the exact 
acceptance rate of the gauge link updates can nevertheless be determined (section |]) and its volume 
dependence studied (section ||). We demonstrate that for a lattice of size (0.8 fm) 4 the exact global 
acceptance rate is > 60% (section ^). 

2. Hierarchical acceptance steps 

Let P(s) be the desired distribution of the states s of a system. Suppose a process that pro- 
poses a new state s' with probability Pq(s' <— s) and fulfils detailed balance with respect to Pq(s). 
A process with fixed point distribution P(s) is then obtained by the iteration of a proposal with 
subsequent Metropolis acceptance step [Q] 

0) Propose s 1 according to Pq(s' <— s) 

1) Pacc(s'^)=min<h, [ 



P(s)Po( S >) 

This hierarchy of a proposal step and an acceptance step, that in combination have the correct fixed 
point distribution, can easily be generalized to an arbitrary number of acceptance steps. The result 
of the first acceptance step 1) is then interpreted as the proposal for a second acceptance step 2) 
and so on. If the target distribution P(s) factorises into n+ 1 parts 

P(s)=P Q (s)P l (s)P 2 (s)...P n (s), (2.2) 

the resulting hierarchical acceptance steps take the form 

0) Propose s' according to Pq(s' <— s) 

1) P w (j /^ j)=imn | l5 ^ 



fl&V w)=min(l,^h (2.3) 



Pn(s) 

In the context of lattice QCD it is plausible to assume Pj(s) <=< exp(— S;(j)) and thus Pi(s') /Pi(s) = 
exp(— Aj(j', s)) with At(s',s) = Si(s') —Sj(s). For Gaussian distributed Aj(s',s) with variance of 
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the acceptance rate in step i) is (TO] 



P®) =erfc( Jaf/% \ . (2.4) 



The acceptance rates might be optimised by parametrising and tuning the factorisation ( |2.2[ ) [Qj. 

In 2-fiavour lattice QCD a simple two-step algorithm would consist of some update of the 
gauge link configuration U according to the gauge action alone, e.g. Po(U) °< exp(— Sc(U)), and 
an acceptance step with the fermion determinant 

(V <- U) = min 1 1 , det 1 . (2.5) 

If the proposal is a 1-link change and the Dirac operator D involves only nearest neighbour cou- 
plings it is easy to show that the acceptance step requires 0(1) inversions. An ergodic algorithm 
is then obtained by sweeps through the lattice. Thus the cost of such an algorithm would scale at 
least like V 2 [§. 

If, on the other hand, a finite fraction oc y of the links is updated for the proposal, the distri- 
bution Pi might be written as P\(U) oc exp(ln(det D^D)). Since the logarithm of the determinant 
of D^D is an extensive quantity the variance of the distribution of Ai(U',U) = ln(det D''D') — 
ln(det D'D) is o 2 V . From ( |2.4[ ) one concludes that for V — y °° the acceptance rate decreases 
exponentially with the volume. 

From the preceding discussion it is obvious that such two-step algorithms will not be efficient 
for large lattices. Indeed numerical experiments show that for lattices larger than ~ (0.2 fm) 4 
(where all links are updated) the acceptance rate quickly becomes less than a percent. However, 
in the context of low mode reweighting the fluctuations of the determinant of dJ 0V/ D\ 0W , where 
Aow is a restriction of D to its low modes, are found to depend only mildly on the volume []7|]. 
The explanation for this observation might be the fact that the fluctuations of the small eigenvalues 
of D^D decrease like l/V Thus, given a factorisation of the determinant that separates low 
(infra-red IR) and high (ultraviolet UV) modes 

det(D) = det(Duv) • • • det(Di R ) , (2.6) 

a hierarchy of acceptance steps can be constructed, where the large fluctuations of the UV modes 
go through a set of filters (acceptance steps) which are more and more dominated by the IR modes: 

0) Pq UV short distance local cheap 

n) P n IR long distance global expensive 

This hierarchy of modes may induce also a hierarchy of costs since it is the low modes that cause 
the most cost in lattice QCD. Furthermore the factorisation should be exact and the terms simple 
to compute. Factorisations that realise these conditions are already used to speed-up the HMC 



algorithm, i.e. in the context of mass-preconditioning ||9|] and domain-decomposition Q1Q ]. Only 



the latter also allows for a decoupling of local updates and will be discussed in the following. 
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3. Domain decomposition 

Domain decomposition was introduced in lattice QCD in ^ and in [ 1C ] the resulting factori- 
sation of the fermion determinant was used to separate short distance and long distance physics in 
the HMC algorithm. 

Suppose a decomposition 'tf of the lattice in non-overlapping blocks b G (cf. fig. [j] for a 
2-dimensional visualisation). The lattice sites are labelled such that the sites belonging to the first 
black block come first, then the second black block and after the last black block the first white 
block and so on. The Dirac operator can then be written in as 

D= /D bb D bw \ (31) 
y iAvb iAvw J 

where D bb (O w ) is a block-diagonal matrix with the black (white) block Dirac operators Dj, on 
the diagonal. The block Dirac operators Dj, fulfil Dirichlet boundary conditions and therefore are 
dominated by short distance physics (if the blocks are small enough). The matrices D bw and D wb 
contain the block interaction terms. The form (|3.1l) induces a factorisation of the determinant 



det(Z)) = H det(D fo ) det(D) , D = 1 - D^D bw D^D wh , (3.2) 



where D is the Schur complement of the decomposition (3.1) and contains block interactions, 
i.e. the long distance physics. A natural separation scale is given by the inverse block size 1 /Lj,. In 
the context of the domain decompositioned HMC the average force associated with the Schur com- 
plement is an order of magnitude smaller than the force associated with the block Dirac operators 
[JlOtl - This indicates that the fluctuations of the determinant of the Schur complement are smaller 



than that of the block determinants. Furthermore the factorisation ( |3.2| ) can be iterated using a 
recursive domain decomposition 

det(D fc )= n det(D*)det(A,). (3.3) 

In the case of one level of recursion the hierarchy of acceptance steps is given by 

0) Update active links on all blocks (e.g. heat-bath/over-relaxation) 



1 ) Pic' (b,b') = min { 1 , det } , Vb,\/b' G % 

D y Dy 



2) (b) = min <^ 1 , det -jL-k } , Vb G V 

3) PS (global) = min 1 1 , det -^-r- 

The set of active links that is updated in step 0) is chosen such that the block acceptance steps 
decouple. In the case of Wilson fermions with clover term these are the links that have at most one 
endpoint on the boundary of a block (white points in fig. [[]). If the smallest blocks, b', consist of not 
more than ~ 6 4 lattice points, the determinant ratios in step 1) can be efficiently computed exactly 
by sparse LU-decomposition. The Schur complements of steps 2) and 3) are usually too large to 
be computed exactly and thus have to be treated stochastically. 
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Figure 1: Block decomposition of a 2-dimensional lattice. The blocks are coloured like a checker board. 
Picture taken from JlQ[ ] 

4. Stochastic acceptance step 



The determinants of ratios of block and the global Schur complements in ( [3.4] ) are replaced 
by stochastic estimators. Dropping the subscripts and setting M = D f l D we replace the exact 
acceptance step by a stochastic acceptance step 



rnin{l,det(M t M)} ^minl^e-^'^- 1 ^} , 



(4.1) 



where T] is a complex Gaussian noise vector that is updated before each acceptance step. Such an 
algorithm can be shown to fulfil detailed balance and yield an acceptance rate that is bounded from 
above by the one of the exact acceptance step [[|, ||]. 

Stochastic acceptance steps will only work if the variance of the estimator can be controlled 
(c.f. eq. (2A)). The eigenvalues X of M'M are real and the bulk of them follows a log-normal distri- 
bution with mean zero and variance It is easy to see that the estimator variance is not defined 
if the smallest eigenvalues of M^M is < 0.5 [11] and that eigenvalues A = 1 do not contribute to 
it. Using a simple model one can further show that the estimator variance is proportional to the 
number Ni of eigenvalues that are not one and the variance o - ^ « of the log-normal distribution of 



the eigenvalues [|12|, |5j] . 

The fact that we are estimating the determinant ratio of the Schur complement and not the 
full Dirac operator helps, since it acts non-trivially only on a subset of the boundary points (white 
points in fig. |]) JTo|], i.e. it reduces N[. The variance ex?-, of the eigenvalue distribution can be 
reduced by relative gauge fixing the old and the proposed link configuration [[j]]. To ensure A > 0.5 
and to balance cost and acceptance rate these measures turn out not to be sufficient. Therefore 
a parametrisation of the proposed change in the gauge field is introduced by a sequence Uj, i = 
0,...,N with Uq = U,Un = U', inducing a factorisation 



N-l 



det (M f M) = FJ det (Af/M,-; 



(4.2) 



Each factor is then replaced by a stochastic estimator with an independent noise vector. The cost 
is then one inversion for each factor. We observe that if the sequence of link configurations fulfils 
| \Ui — Ui + \ \\<x \/N ,Mi <N the variance and thus the variance of the estimator of this factor 
is reduced by l/N 2 . 1 The variance of the product of the N factors is then reduced by a factor l/N. 
In the limit N — > °° the exact determinant det(M^M) is obtained and also the exact variance of the 



For this statement to hold, the plaquette should not change too much along the sequence of link configurations | 
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Figure 2: Exact acceptance rate for volumes (in lat- Figure 3: Histogram of the plaquette for the 3-step 
tice units) ranging from 8 4 to 16 4 . A linear fit of the algorithm and a HMC run at the same parameters 
variance (inserted in (2.4 )) describes the data well. (cf. section n) and using the same bins. 



determinant due to the gauge fluctuations, i.e. the variance in ( |2.4| ). This provides the possibility to 
analyse the exact acceptance rate as a function of the volume. 

5. Exact acceptance rate 

We consider a 3-step algorithm that first updates all active links of a domain decomposition 
with block size 4 4 , which amounts to about 10% of all links. This change is then block-wise 
accepted with the exact determinant ratio of the block Dirac operators. After some iterations of 
the combination of step one and two, the proposal is passed to the global acceptance step with the 
Schur complement of the 4 4 block decomposition. 

The gauge action we use is the Wilson plaquette action and the Dirac operator is the plain 
Wilson-Dirac operator. The lattice parameters are j3 = 5.8 and K = 0.15462. Theses values cor- 
responds to a lattice spacing of 0.05 fm and a pion mass of 400 MeV (both determined on a large 
lattice in [§]). On lattices with 8 4 up to 16 4 points we determine the variance of the stochastic es- 



timator in the global step for different values of N in eq. (42) and extrapolate in 1 /N to zero, thus 
obtaining an estimate for the exact variance of (V) of the logarithm of the determinant as a function 
of the volume. Via (2_4) the exact acceptance rate can also be determined from the variance. 2 

The exact acceptance rates as determined from the variances are plotted in fig. || together with 
the result of a linear fit to of (V ) constrained to zero. The 3-step algorithm of this section shows 
a good acceptance for lattices up to 16 2 x 8 2 . This is the region where the error function can be 
approximated by a Taylor expansion with a linear leading term. The data supports a linear growth 
of the fluctuations with the volume. 

6. Realistic run 

The algorithm of the last section is characterised by the separation scale of low and high modes 
given by the inverse block size 1 /Lb, with = 4a. From fig. ^ we conclude that this scale is too 
large for a lattice with the same parameters and 16 4 points. Indeed switching to a block size of 8 4 



"We tested the (tacitly assumed) validity of the Gaussian model for finite values of N. 
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and updating the links in a 6 4 block inside each 8 4 block (which amounts to 8% of all links) the 
global acceptance rate is raised to 61% (at fixed N = 21 and 45% block acceptance). A histogram 
of the plaquette is shown in fig. ||. The plot comprises the data of 1000 global acceptance steps 
after thermalisation. For comparison the histogram of the plaquette obtained from a HMC run with 
the same parameters is plotted as well. The histograms have to agree within statistical errors and 
we checked this for the mean value and the variance. 

7. Conclusion and outlook 

We have developed an algorithm that is based on a hierarchical filter of acceptance steps. It 
deploys recursive domain decomposition to separate short and long distance physics. On lattices 
with up to 16 4 lattice points (or size (0.8 fm) 4 ) the algorithm has good global acceptance > 50%. 

We have already extended this study to larger volumes and other fermion actions including the 
clover term and/or HYP-smeared links The application of the techniques presented in sections 
H| and || to mass reweighting [13, 14] is promising and will be pursued. 
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